Caspian Sea
Counterfactual Memorization in Neural Language Models
Chiyuan Zhang, Daphne Ippolito, Katherine Lee, Matthew Jagielski, Florian Tramèr, Nicholas Carlini
Modern neural language models widely used in tasks across NLP risk memorizing sensitive information from their training data. As models continue to scale up in parameters, training data, and compute, understanding memorization in language models is both important from a learning-theoretical point of view and practically crucial in real-world applications. An open question in previous studies of memorization in language models is how to filter out "common" memorization. In fact, most memorization criteria strongly correlate with the number of occurrences in the training set, capturing "common" memorization such as familiar phrases, public knowledge, or templated texts. In this paper, we provide a principled perspective inspired by a taxonomy of human memory in psychology. From this perspective, we formulate a notion of counterfactual memorization, which characterizes how a model's predictions change if a particular document is omitted during training. We identify and study counterfactually-memorized training examples in standard text datasets. We further estimate the influence of each training example on the validation set and on generated texts, and show that this can provide direct evidence of the source of memorization at test time.
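The definition of counterfactual memorization above compares a model's behaviour on a document when that document was in the training set against its behaviour when the document was left out. A minimal sketch of one way to estimate such a quantity follows. It assumes several models have already been trained on different random subsets of the data and that a per-document score (for example, token accuracy) has been recorded for each model. The function name `counterfactual_memorization` and the arrays `included` and `scores` are hypothetical names chosen for illustration, and the toy numbers are made up; this is not presented as the authors' exact procedure.

```python
import numpy as np


def counterfactual_memorization(included, scores):
    """Per-document gap between 'trained with it' and 'trained without it'.

    included: bool array, shape (n_models, n_docs); True where a model's
              training subset contained the document (assumed input).
    scores:   float array, same shape; each model's score on each document
              (assumed input, e.g. per-token accuracy).
    Returns a float array of length n_docs. Entries are NaN for documents
    that were always included or always excluded.
    """
    included = np.asarray(included, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    if included.shape != scores.shape:
        raise ValueError("included and scores must have the same shape")

    # Count how many models saw / did not see each document.
    n_in = included.sum(axis=0)
    n_out = (~included).sum(axis=0)

    # Sum scores separately over the two groups of models.
    sum_in = np.where(included, scores, 0.0).sum(axis=0)
    sum_out = np.where(included, 0.0, scores).sum(axis=0)

    # 0/0 yields NaN when a group is empty; silence the warning.
    with np.errstate(invalid="ignore", divide="ignore"):
        mean_in = sum_in / n_in
        mean_out = sum_out / n_out

    return mean_in - mean_out


# Toy data: 4 models, 3 documents (illustrative values only).
included = [
    [True, False, True],
    [True, True, False],
    [False, True, True],
    [False, False, True],
]
scores = [
    [0.9, 0.8, 1.0],
    [0.9, 0.9, 0.2],
    [0.9, 0.9, 1.0],
    [0.9, 0.8, 1.0],
]
mem = counterfactual_memorization(included, scores)
```

In this toy data the first document scores the same whether or not it was trained on (a gap of about 0, the "common" case), while the third document is only predicted well by models that saw it (a gap of about 0.8), which is the pattern the definition is meant to isolate.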
How artificial intelligence is shaking up the oil and gas industry
The Azeri-Chirag-Deepwater Gunashli (ACG), a sprawling complex of offshore oil fields 60 miles off Azerbaijan's capital Baku, is causing something of a headache for BP's head of technology. "We have huge production in Azerbaijan of wells that are quite prone to producing sand, and sand if it's produced in high quantities from our oil wells can do damage to the metalwork and also choke back the production," says David Eyton. The ACG, which pumps out an average of 584,000 barrels of oil per day, is a prized asset for BP, and any holdups could cost the company dearly. But the man leading BP's technology revolution thinks he has a solution: artificial intelligence (AI).